
    NP Animacy Identification for Anaphora Resolution

    In anaphora resolution for English, animacy identification can play an integral role in applying agreement restrictions between pronouns and candidates and, as a result, can improve the accuracy of anaphora resolution systems. In this paper, two methods for animacy identification are proposed and evaluated using intrinsic and extrinsic measures. The first method is rule-based and uses information about the unique beginners in WordNet to classify NPs on the basis of their animacy. The second method relies on a machine learning algorithm which exploits a WordNet enriched with animacy information for each sense. The effect of word sense disambiguation on the two methods is also assessed. The intrinsic evaluation reveals that the machine learning method reaches human levels of performance. The extrinsic evaluation demonstrates that animacy identification can be beneficial in anaphora resolution, especially in cases where animate entities are identified with high precision.
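
    The rule-based method can be pictured with a short sketch using NLTK's WordNet interface: every noun sense belongs to a lexicographer file rooted in one of WordNet's unique beginners (noun.person, noun.animal, noun.artifact, ...), and voting over a noun's senses yields an animacy class. The category lists and the equal-vote rule below are illustrative assumptions, not the paper's exact rules.

```python
# pip install nltk; then nltk.download("wordnet") once before use.
from nltk.corpus import wordnet as wn

ANIMATE_LEXNAMES = {"noun.person", "noun.animal"}  # assumed animate beginners
INANIMATE_LEXNAMES = {"noun.artifact", "noun.object",
                      "noun.substance", "noun.plant"}  # assumed inanimate

def classify_animacy(head_noun: str) -> str:
    """Vote over all noun senses; without WSD, every sense counts equally."""
    animate = inanimate = 0
    for synset in wn.synsets(head_noun, pos=wn.NOUN):
        if synset.lexname() in ANIMATE_LEXNAMES:
            animate += 1
        elif synset.lexname() in INANIMATE_LEXNAMES:
            inanimate += 1
    if animate == inanimate:
        return "unknown"
    return "animate" if animate > inanimate else "inanimate"

print(classify_animacy("teacher"))  # -> animate
print(classify_animacy("table"))   # -> inanimate
```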

    The evaluation of liver fibrosis regression in chronic hepatitis C patients after the treatment with direct-acting antiviral agents – A review of the literature

    The second generation of direct-acting antiviral agents is the current treatment for chronic viral hepatitis C infection. To evaluate the regression of liver fibrosis in patients receiving this therapy, liver biopsy remains the most accurate method, but the invasiveness of the procedure is its major drawback. Different non-invasive tests have been used to study changes in the stage of liver fibrosis in patients with chronic viral hepatitis treated with second-generation direct-acting antiviral agents: liver stiffness measurements (with transient elastography or acoustic radiation force impulse elastography) or scores that combine serum markers into a fibrosis index. We present a literature review of the available data regarding the long-term evolution of liver fibrosis after treatment with direct-acting antiviral agents for chronic viral hepatitis C.
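
    For concreteness, one widely used serum-marker score of this kind is the FIB-4 index, which combines age, transaminase levels and platelet count. The review covers such scores generically rather than prescribing this one, so the sketch below is only an example.

```python
import math

def fib4(age_years: float, ast_u_per_l: float, alt_u_per_l: float,
         platelets_10e9_per_l: float) -> float:
    """FIB-4 index: (age * AST) / (platelets * sqrt(ALT)).
    Commonly cited cut-offs (roughly <1.45 low, >3.25 high probability
    of advanced fibrosis) vary across studies."""
    return (age_years * ast_u_per_l) / (
        platelets_10e9_per_l * math.sqrt(alt_u_per_l))

# Toy values: a 55-year-old with AST 40 U/L, ALT 35 U/L, platelets 180e9/L.
print(round(fib4(55, 40, 35, 180), 2))
```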

    CANELC: constructing an e-language corpus

    This paper reports on the construction of CANELC, the Cambridge and Nottingham e-language Corpus: a one million word corpus of digital communication in English, taken from online discussion boards, blogs, tweets, emails and SMS messages. The paper outlines the approaches used when planning the corpus: obtaining consent, collecting the data and compiling the corpus database. This is followed by a detailed analysis of some of the patterns of language used in the corpus, including a discussion of the key words and phrases used as well as the common themes and semantic associations connected with the data. These discussions form the basis of an investigation of how e-language operates in ways both similar to and different from spoken and written records of communication (as evidenced by the British National Corpus, BNC). The corpus was built as part of a collaborative project between The University of Nottingham and Cambridge University Press, with whom sole copyright of the annotated corpus resides. Plans to extend the corpus are under discussion, and the legal dimension of corpus 'ownership' of some forms of unannotated data is complex and under constant review. At present the annotated corpus is available only to authors and researchers working for CUP and is not more generally available.
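
    The key-word analysis described above is typically computed with a keyness statistic that compares word frequencies in the corpus against a reference corpus such as the BNC. A minimal sketch, assuming the standard log-likelihood measure (the paper does not spell out its exact computation):

```python
import math

def log_likelihood(freq_target: int, size_target: int,
                   freq_ref: int, size_ref: int) -> float:
    """Log-likelihood keyness of one word: how strongly its frequency in
    the target corpus diverges from its frequency in a reference corpus."""
    total = size_target + size_ref
    expected_target = size_target * (freq_target + freq_ref) / total
    expected_ref = size_ref * (freq_target + freq_ref) / total
    ll = 0.0
    if freq_target:
        ll += freq_target * math.log(freq_target / expected_target)
    if freq_ref:
        ll += freq_ref * math.log(freq_ref / expected_ref)
    return 2 * ll

# Toy numbers: a word seen 500 times in a 1M-word e-language corpus
# versus 2,000 times in a 100M-word reference corpus.
print(round(log_likelihood(500, 1_000_000, 2_000, 100_000_000), 1))
```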

    Measuring text simplification with the crowd

    Text can often be complex and difficult to read, especially for people with cognitive impairments or low literacy skills. Text simplification is a process that reduces the complexity of both wording and structure in a sentence, while retaining its meaning. However, this is currently a challenging task for machines, and thus, providing effective on-demand text simplification to those who need it remains an unsolved problem. Even evaluating the simplicity of text remains a challenging problem for both computers, which cannot understand the meaning of text, and humans, who often struggle to agree on what constitutes a good simplification. This paper focuses on the evaluation of English text simplification using the crowd. We show that leveraging crowds can result in a collective decision that is accurate and converges to a consensus rating. Our results from 2,500 crowd annotations show that the crowd can effectively rate levels of simplicity. This may allow simplification systems and system builders to get better feedback about how well content is being simplified, as compared to standard measures which classify content into 'simplified' or 'not simplified' categories. Our study provides evidence that the crowd could be used to evaluate English text simplification, as well as to create simplified text in future work.
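
    A minimal sketch of the aggregation idea: collect per-item simplicity ratings from successive crowd workers and stop once the running mean stabilises. The stopping rule and the 5-point scale are assumptions for illustration, not the study's protocol.

```python
from statistics import mean

def converged(ratings: list[float], window: int = 5, tol: float = 0.2) -> bool:
    """Heuristic stop rule: the running mean moved less than `tol` over
    the last `window` ratings. An assumption, not the paper's protocol."""
    if len(ratings) < window + 1:
        return False
    return abs(mean(ratings) - mean(ratings[:-window])) < tol

# Simulated 5-point simplicity ratings for one original/simplified pair.
ratings: list[float] = []
for r in [4, 5, 4, 4, 3, 4, 4, 4]:
    ratings.append(r)
    if converged(ratings):
        print(f"consensus ~ {mean(ratings):.2f} after {len(ratings)} ratings")
        break
```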

    Are decision trees a feasible knowledge representation to guide extraction of critical information from randomized controlled trial reports?

    Background: This paper proposes the use of decision trees as the basis for automatically extracting information from published randomized controlled trial (RCT) reports. An exploratory analysis of RCT abstracts is undertaken to investigate the feasibility of using decision trees as a semantic structure. Quality-of-paper measures are also examined.
    Methods: A subset of 455 abstracts (randomly selected from a set of 7620 retrieved from Medline for 1998–2006) is examined for the quality of RCT reporting, the identifiability of RCTs from abstracts, and the completeness and complexity of RCT abstracts with respect to key decision tree elements. Abstracts were manually assigned to 6 sub-groups distinguishing primary RCTs from other design types. For primary RCT studies, we analyzed and annotated the reporting of intervention comparison, population assignment and outcome values. To measure completeness, the frequencies with which complete intervention, population and outcome information are reported in abstracts were measured. A qualitative examination of the reporting language was also conducted.
    Results: Decision tree elements are manually identifiable in the majority of primary RCT abstracts. 73.8% of a random subset were primary studies with a single population assigned to two or more interventions. Of these primary RCT abstracts, 68% were structured, 63% contained pharmaceutical interventions, and 84% reported the total number of study subjects. In a subset of 21 abstracts examined, 71% reported numerical outcome values.
    Conclusion: The manual identifiability of decision tree elements in abstracts suggests that decision trees could be a suitable construct to guide machine summarisation of RCTs. The presence of decision tree elements could also act as an indicator of RCT report quality in terms of completeness and uniformity.
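
    To make the target representation concrete, the decision-tree structure can be rendered as a small tree of typed nodes: a trial population branches into intervention arms, each carrying its reported outcome values. The field names and toy data below are illustrative assumptions, not the paper's schema.

```python
from dataclasses import dataclass, field

@dataclass
class Outcome:
    measure: str   # e.g. the outcome measure reported in the abstract
    value: str     # the numerical outcome value, kept as reported

@dataclass
class InterventionArm:
    intervention: str
    n_subjects: int
    outcomes: list[Outcome] = field(default_factory=list)

@dataclass
class TrialTree:
    population: str                 # single assigned population
    arms: list[InterventionArm] = field(default_factory=list)

# Hypothetical extraction result for one two-arm trial abstract.
tree = TrialTree(
    population="Adults with type 2 diabetes (n=200)",
    arms=[
        InterventionArm("drug A 10 mg/day", 100,
                        [Outcome("HbA1c change", "-0.8%")]),
        InterventionArm("placebo", 100,
                        [Outcome("HbA1c change", "-0.1%")]),
    ],
)
print(len(tree.arms), "arms extracted")
```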

    Comparing pronoun resolution algorithms

    This paper discusses the comparative evaluation of five well-known pronoun resolution algorithms, conducted with the help of a purpose-built tool for consistent evaluation in anaphora resolution termed the evaluation workbench. The workbench enables the evaluation and comparison of pronoun resolution algorithms on the basis of the same preprocessing tools and test data. The tool is controlled by the user, who can conduct the evaluation according to a variety of parameters with regard to the types of anaphors and the samples used for evaluation. The extensive comparative evaluation of the pronoun resolution algorithms showed that their performance was significantly lower than the figures reported in the original papers describing the algorithms. The evaluation study concluded that the main reason for this drop in performance is the fact that all algorithms operate in a fully automatic mode.
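
    The core of such a workbench is that every algorithm is scored on identical preprocessed input against identical gold annotations. A minimal sketch of that harness, with an assumed resolver interface (the workbench's real API is not described at this level of detail):

```python
from typing import Callable

# A resolver maps one pronoun's context to a chosen antecedent id.
Resolver = Callable[[dict], str]

def success_rate(resolver: Resolver, gold: list[dict]) -> float:
    """Fraction of anaphoric pronouns whose antecedent is resolved
    correctly against the shared gold annotations."""
    correct = sum(1 for case in gold if resolver(case) == case["antecedent"])
    return correct / len(gold)

def compare(resolvers: dict[str, Resolver], gold: list[dict]) -> None:
    """Score every algorithm on the same test data, as the workbench does."""
    for name, resolver in resolvers.items():
        print(f"{name}: {success_rate(resolver, gold):.1%}")

# Usage (hypothetical resolver functions and gold cases):
# compare({"algorithm_a": resolve_a, "algorithm_b": resolve_b}, gold_cases)
```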

    The QALL-ME Framework: A Specifiable-Domain Multilingual Question Answering Architecture.

    This paper presents the QALL-ME Framework, a reusable architecture for building multilingual Question Answering (QA) systems working on structured data. The framework is released as free open source software with a set of demo components and extensive documentation. As the main characteristics of the QALL-ME Framework we point out: (i) domain portability, achieved by an ontology modelling of the target domain; (ii) context awareness regarding the space and time of the question; (iii) the use of textual entailment engines as the core of question interpretation; and (iv) a Service Oriented Architecture realized with interchangeable web services. Furthermore, we present a running example to clarify how the framework processes questions, as well as a case study that shows a QA application successfully built with the QALL-ME Framework for cinema/movie events in the tourism domain.
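
    Point (iii) can be illustrated with a sketch: the incoming question is matched against stored question patterns, each paired with a retrieval procedure over the structured data, and the best-entailed pattern wins. The pattern store and engine interface below are stand-ins for illustration, not QALL-ME's actual interfaces.

```python
from typing import Callable

# An entailment engine scores how strongly `text` entails `hypothesis`.
EntailmentEngine = Callable[[str, str], float]

# Hypothetical pattern store: question pattern -> retrieval procedure name.
PATTERNS = {
    "Which films are showing in [LOCATION] on [DATE]?": "query_screenings",
    "Where is [CINEMA] located?": "query_cinema_address",
}

def interpret(question: str, entails: EntailmentEngine) -> str:
    """Pick the retrieval procedure of the best-entailed stored pattern."""
    best_pattern = max(PATTERNS, key=lambda p: entails(question, p))
    return PATTERNS[best_pattern]

# Usage (with some real entailment engine plugged in):
# interpret("What's on at the cinema in Trento tonight?", my_engine)
# -> "query_screenings"
```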

    Sentence retrieval for abstracts of randomized controlled trials

    Background: The practice of evidence-based medicine (EBM) requires clinicians to integrate their expertise with the latest scientific research, but this is becoming increasingly difficult with the growing number of published articles. There is a clear need for better tools to improve clinicians' ability to search the primary literature. Randomized clinical trials (RCTs) are the most reliable source of evidence documenting the efficacy of treatment options. This paper describes the retrieval of key sentences from abstracts of RCTs as a step towards helping users find relevant facts about the experimental design of clinical studies.
    Method: Using Conditional Random Fields (CRFs), a popular and successful method for natural language processing problems, sentences referring to Intervention, Participants and Outcome Measures are automatically categorized. This is done by extending a previous approach for labeling sentences in an abstract with general categories associated with scientific argumentation or rhetorical roles: Aim, Method, Results and Conclusion. The methods are tested on several corpora of RCT abstracts: first, structured abstracts with headings specifically indicating Intervention, Participant and Outcome Measures; in addition, a manually annotated corpus of structured and unstructured abstracts is prepared for testing a classifier that identifies sentences belonging to each category.
    Results: Using CRFs, sentences can be labeled for the four rhetorical roles with F-scores from 0.93 to 0.98, outperforming Support Vector Machines. Furthermore, sentences can be automatically labeled for Intervention, Participant and Outcome Measures in unstructured and structured abstracts where the section headings do not specifically indicate these three topics. F-scores of up to 0.83 and 0.84 are obtained for Intervention and Outcome Measure sentences.
    Conclusion: Results indicate that some of the methodological elements of RCTs are identifiable at the sentence level in both structured and unstructured abstract reports. This is promising in that sentences labeled automatically could potentially form concise summaries, assist in information retrieval, and support finer-grained extraction.
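
    A minimal sketch of sentence-level sequence labeling in this spirit, using the sklearn-crfsuite package (our tooling choice; the paper's feature set and implementation differ). Each abstract is a sequence of sentences, each sentence a feature dict, and the CRF assigns one rhetorical role per sentence.

```python
# pip install sklearn-crfsuite
import sklearn_crfsuite

def sentence_features(sentences: list[str], i: int) -> dict:
    """A few illustrative surface features for one sentence in context."""
    sent = sentences[i]
    return {
        "position": i / len(sentences),          # relative position in abstract
        "first_token": sent.split()[0].lower(),  # e.g. "we", "patients"
        "has_percent": "%" in sent,
        "has_number": any(ch.isdigit() for ch in sent),
    }

def featurize(abstract: list[str]) -> list[dict]:
    return [sentence_features(abstract, i) for i in range(len(abstract))]

# Tiny toy training set: one abstract as a sequence of sentences, with
# one rhetorical-role label per sentence.
train_abstracts = [
    ["We aimed to compare drug A with placebo.",
     "Patients were randomly assigned to two groups.",
     "Mean reduction was 12% versus 3%.",
     "Drug A was superior to placebo."],
]
train_labels = [["Aim", "Method", "Results", "Conclusion"]]

crf = sklearn_crfsuite.CRF(algorithm="lbfgs", c1=0.1, c2=0.1,
                           max_iterations=100)
crf.fit([featurize(a) for a in train_abstracts], train_labels)
print(crf.predict([featurize(train_abstracts[0])]))
```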